Perceptual Hashing

Perceptual hashing is the use of a fingerprinting algorithm that produces a snippet, hash, or fingerprint of various forms of multimedia. A perceptual hash is a type of locality-sensitive hash: the hashes of two pieces of multimedia are analogous (close to each other) if their features are similar. This is not to be confused with cryptographic hashing, which relies on the avalanche effect, whereby a small change in the input value creates a drastic change in the output value. Perceptual hash functions are widely used to find cases of online copyright infringement and in digital forensics, because the correlation between hashes allows similar data to be found (for instance, two images that differ only by a watermark).
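The contrast with cryptographic hashing can be made concrete with a short Python sketch. It uses the simple "average hash" (aHash) technique as a stand-in for a perceptual hash, not any particular published algorithm, and assumes the image has already been reduced to an 8x8 grid of grayscale values; the helper names are hypothetical. Brightening the picture or nudging a single pixel leaves the average hash unchanged, while the SHA-256 digest of the raw bytes changes.

```python
import hashlib


def average_hash(pixels):
    """Average hash (aHash) of a small grayscale image.

    `pixels` is a list of rows of ints in 0..255, assumed to be already
    downscaled (here 8x8, giving a 64-bit hash). Each bit records whether
    a pixel is brighter than the image's mean intensity.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def sha256_hex(pixels):
    """Cryptographic hash of the raw pixel bytes, for comparison."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


# Synthetic 8x8 diagonal gradient standing in for a downscaled photo.
original = [[(x + y) * 16 for x in range(8)] for y in range(8)]
# Same picture, uniformly brightened (values stay within 0..255).
brighter = [[p + 10 for p in row] for row in original]
# Same picture with one pixel nudged by a single intensity level.
tweaked = [row[:] for row in original]
tweaked[0][0] += 1

for name, img in (("brighter", brighter), ("tweaked", tweaked)):
    print(name,
          "| aHash distance:", hamming(average_hash(original), average_hash(img)),
          "| SHA-256 equal:", sha256_hex(original) == sha256_hex(img))
# brighter | aHash distance: 0 | SHA-256 equal: False
# tweaked | aHash distance: 0 | SHA-256 equal: False
```

Perceptual hashes are usually compared by Hamming distance against a threshold rather than for exact equality, since real edits such as recompression typically flip a few bits.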


Development

The 1980 work of Marr and Hildreth is a seminal paper in this field. The July 2010 thesis of Christoph Zauner is a well-written introduction to the topic. In June 2016, Azadeh Amir Asgari published work on robust image hash spoofing; Asgari notes that perceptual hash functions, like any other algorithm, are prone to errors. Researchers remarked in December 2017 that Google Image Search is based on a perceptual hash. In research published in November 2021, investigators focused on a manipulated image of Stacey Abrams that was published to the internet prior to her loss in the 2018 Georgia gubernatorial election. They found that the pHash algorithm was vulnerable to nefarious actors.


Characteristics

Research reported in January 2019 at Northumbria University has shown that, for video, perceptual hashing can be used simultaneously to identify similar content for video copy detection and to detect malicious manipulations for video authentication. The proposed system performs better than existing video hashing techniques in terms of both identification and authentication.

Research on deep-learning-based perceptual hashing for audio, reported in May 2020 by the University of Houston, has shown better performance than traditional audio fingerprinting methods for detecting similar or copied audio that has been subjected to transformations.

In addition to its uses in digital forensics, perceptual hashing can be applied to a wide variety of situations, as research reported in 2019 by a Russian group has shown. Just as images can be compared for copyright infringement, the group found that the technique could be used to compare and match images in a database. Their proposed algorithm proved to be not only effective but also more efficient than the standard means of database image searching.

A Chinese team reported in July 2019 that they had discovered a perceptual hash for speech encryption which proved to be effective. They were able to create a system in which the encryption was not only more accurate but also more compact.
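The Russian group's database matching described above relies on comparing a query hash against many stored hashes. Their algorithm is not described here; purely as a generic illustration, the following Python sketch indexes integer perceptual hashes in a BK-tree, a standard structure for searching metric spaces. Because Hamming distance satisfies the triangle inequality, a query only descends into children whose edge distance lies within the search radius of its distance to the current node, instead of scanning every entry. The class name, file labels, and hash values are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """BK-tree over integer hashes, using Hamming distance as the metric."""

    def __init__(self):
        # Each node is [hash, label, {edge_distance: child_node}].
        self.root = None

    def add(self, h, label):
        node = [h, label, {}]
        if self.root is None:
            self.root = node
            return
        current = self.root
        while True:
            d = hamming(h, current[0])
            child = current[2].get(d)
            if child is None:
                current[2][d] = node
                return
            current = child

    def search(self, h, radius):
        """Return sorted (distance, label) pairs within `radius` bits of h."""
        results = []
        stack = [] if self.root is None else [self.root]
        while stack:
            node = stack.pop()
            d = hamming(h, node[0])
            if d <= radius:
                results.append((d, node[1]))
            # Every node under edge k is exactly k bits from this node, so by
            # the triangle inequality it can only match if |d - k| <= radius.
            for k, child in node[2].items():
                if d - radius <= k <= d + radius:
                    stack.append(child)
        return sorted(results)


index = BKTree()
index.add(0b1011_0000, "cat.jpg")
index.add(0b1011_0001, "cat_resized.jpg")
index.add(0b0100_1111, "bridge.jpg")
index.add(0b1100_0110, "dog.jpg")

print(index.search(0b1011_0000, radius=1))
# [(0, 'cat.jpg'), (1, 'cat_resized.jpg')]
```

The pruning pays off most when the radius is small relative to the hash length; for large radii the search degrades toward visiting every node.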
Apple Inc. reported as early as August 2021 a child sexual abuse material (CSAM) detection system that it calls NeuralHash. A technical summary document, which explains the system with copious diagrams and example photographs, states that "Instead of scanning images [on] corporate [iCloud] servers the system performs on-device matching using a database of known CSAM image hashes provided by [the] National Center for Missing and Exploited Children (NCMEC) and other child-safety organizations. Apple further transforms this database into an unreadable set of hashes, which is securely stored on users’ devices."

In an essay entitled "The Problem With Perceptual Hashes", Oliver Kuederle presents a startling collision generated by a piece of commercial neural-net software of the NeuralHash type: a photographic portrait of a real woman (Adobe Stock #221271979) reduces through the test algorithm to the same hash as a photograph of a piece of abstract art (from the "deposit photos" database). Both sample images are in commercial databases. Kuederle is concerned about collisions like this: "These cases will be manually reviewed. That is, according to Apple, an Apple employee will then look at your (flagged) pictures... Perceptual hashes are messy. When such algorithms are used to detect criminal activities, especially at Apple scale, many innocent people can potentially face serious problems... Needless to say, I’m quite worried about this."

Researchers have since published a comprehensive analysis entitled "Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash", in which they investigate the vulnerability of NeuralHash, as a representative of deep perceptual hashing algorithms, to various attacks. Their results show that hash collisions between different images can be achieved with minor changes applied to the images. According to the authors, these results demonstrate that such attacks are a real possibility and could enable the flagging and possible prosecution of innocent users. They also state that the detection of illegal material can easily be avoided, and the system outsmarted, by simple image transformations such as those provided by free-to-use image editors. The authors assume their results apply to other deep perceptual hashing algorithms as well, questioning their overall effectiveness and functionality in applications such as client-side scanning and chat controls.
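The two failure modes described above, different images sharing a hash and the same image escaping a match after a simple edit, can be reproduced even with a far simpler function than NeuralHash. The following Python sketch applies a plain average hash (aHash) to hypothetical 8x8 grayscale images, with an assumed match threshold of 10 bits; it illustrates the general trade-off only and says nothing about NeuralHash's actual behavior.

```python
def average_hash(pixels):
    """64-bit average hash: one bit per pixel, set if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


MATCH_THRESHOLD = 10  # hypothetical: hashes within 10 bits count as a match


def is_match(img_a, img_b, threshold=MATCH_THRESHOLD):
    return hamming(average_hash(img_a), average_hash(img_b)) <= threshold


# Smooth diagonal gradient, standing in for a "known" image.
gradient = [[(x + y) * 16 for x in range(8)] for y in range(8)]
# A visibly different two-tone image with the same above/below-mean layout.
two_tone = [[200 if x + y >= 8 else 50 for x in range(8)] for y in range(8)]
# The known image mirrored left to right: same content, one simple edit.
mirrored = [row[::-1] for row in gradient]

print(is_match(gradient, two_tone))  # True  (collision: different images)
print(is_match(gradient, mirrored))  # False (evasion: flipped copy missed)
print(hamming(average_hash(gradient), average_hash(mirrored)))  # 32
```

A system could also hash flipped or rotated variants of each query, at the cost of more comparisons and more chances of false matches. Collisions of the first kind cannot be eliminated entirely: there are far more possible images than the 2^64 possible hash values, so many different images must share each hash.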


Gallery

Images: Töölö Malminkadulta, Helsinki 1907.jpg and Leppäsuo - N234 (hkm.HKMS000005-000000or).jpg (comparison with pHash checksum)
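The pHash checksum compared in the gallery is commonly described as DCT-based: the image is reduced to a small grayscale square, transformed with a discrete cosine transform (DCT), and a block of low-frequency coefficients is compared against its median. The Python sketch below follows that outline under stated assumptions (a 32x32 input, frequencies 1..8 in each direction, an unnormalized DCT-II); it is not the library's code and is not expected to reproduce its checksums. The helper names and test images are hypothetical.

```python
import math
import random
import statistics

N = 32  # assumed: input already reduced to a 32x32 grayscale image
K = 8   # keep an 8x8 block of low-frequency DCT coefficients

# COS[u][i] = cos(pi * (2i + 1) * u / (2N)): the DCT-II basis functions.
COS = [[math.cos(math.pi * (2 * i + 1) * u / (2 * N)) for i in range(N)]
       for u in range(K + 1)]


def dct_hash(pixels):
    """64-bit DCT-based hash of a 32x32 grayscale image (list of rows).

    Uses frequencies 1..K in both directions, skipping the zero-frequency
    row and column that mostly encode overall brightness. Normalization is
    omitted because every kept coefficient shares the same scale factor.
    """
    # 1-D DCT along each row, horizontal frequencies 1..K only.
    rows = [[sum(row[x] * COS[u][x] for x in range(N)) for u in range(1, K + 1)]
            for row in pixels]
    # 1-D DCT down each column of that result, vertical frequencies 1..K.
    coeffs = [sum(rows[y][j] * COS[v][y] for y in range(N))
              for v in range(1, K + 1) for j in range(K)]
    median = statistics.median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | int(c > median)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


rng = random.Random(42)
image = [[rng.randint(0, 255) for _ in range(N)] for _ in range(N)]
# Lower contrast and raised brightness: an affine change of every pixel.
adjusted = [[0.8 * p + 20 for p in row] for row in image]
unrelated = [[rng.randint(0, 255) for _ in range(N)] for _ in range(N)]

print(hamming(dct_hash(image), dct_hash(adjusted)))   # expected near 0
print(hamming(dct_hash(image), dct_hash(unrelated)))  # typically near 32
```

Each kept basis function sums to zero across the image, so adding a constant brightness leaves the kept coefficients unchanged, and scaling by a positive contrast factor scales every coefficient and the median alike; the bits therefore survive such edits up to floating-point rounding.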


See also

* Geometric hashing
* Reverse image search
* Digital video fingerprinting
* Audio fingerprinting


External links


pHash - an open source perceptual hash library

Blockhash.io - an open standard for perceptual hashes

Insight - a perceptual hash tutorial